Integration of Clinical and Genomic data: a Methodological Survey

Authors

  • Sanjoy Dey
  • Rohit Gupta
  • Michael Steinbach
  • Vipin Kumar
Abstract

Human diseases are inherently complex and governed by the complicated interplay of several underlying factors. Clinical research focuses on behavioral, demographic and pathology information, whereas molecular genomics focuses on finding underlying genetic and genomic factors in genomic data collected on mRNA expression, proteomics, biological networks, and other microbiological features. However, each of these clinical and genomic datasets contains information only about one particular aspect of a complex disease, rather than covering all of the several complicated underlying risk factors. This has led to a new area of research that integrates both clinical and genomic data and aims to extract more information about diseases by considering not only all the various factors, but also the interactions among those factors, which cannot be captured by clinical and genomic studies that are performed independently of each other. Although initial efforts have already been made to develop such integrative modeling of the clinical and genomic data to shed light on the biological mechanism of the diseases, the research field is still in a rudimentary stage. In this review article, we survey the general issues, challenges and current work of clinicogenomic studies. We also summarize the current state of the field and discuss some possibilities for future work.

Until the last decade, traditional clinical care and management of complex diseases mainly relied on different clinico-pathological data, such as signs and symptoms, demographic data, pathological lab test results, and medical images. In addition, efforts have been made to capture genetic factors by maintaining the family history of patients.
The effect of such clinical and histo-pathological markers is assessed by cohort-based studies conducted on large populations (Szklo 1998), and the knowledge obtained from these studies is summarized as clinical guidelines for the diagnosis, prognosis, monitoring and treatment of human disease, e.g., NPI (Galea, Blamey et al. 1992) and Adjuvant! Online (Goldhirsch, Coates et al. 2006) for breast cancer and palmOne (Blumberg 2004) for prostate cancer. However, this approach still falls short. For example, some patients suffer adverse drug reactions even though their risk factors are similar to those of patients who were cured by the same therapeutic treatment. This issue stems from the strategy of 'one drug fits all' and motivates the need to improve on conclusions drawn from cohort-based studies so that the underlying mechanism of complex diseases can be understood at the individual patient level.

The recent advancement of high-throughput technology has led to an abundance of information for each individual at the micro-molecular level. A myriad of genetic, genomic and metabolomic data have been collected to capture different aspects of cellular mechanisms that shed light on human physiology. Examples include SNPs, which provide information about the genetic polymorphism of an individual; gene expression, which measures transcription; and protein and metabolite profiles, which capture protein abundance and post-translational modifications. These high-throughput datasets have helped answer some complex biological questions for different diseases, such as assessing prognostic effects (Sotiriou and Piccart 2007), (Driouch, Landemaine et al. 2007), (Potti, Mukherjee et al. 2006), (Garber, Troyanskaya et al. 2001), detecting epistasis effects on diseases (Anastassiou 2007), and discovering new sub-phenotypes of complex diseases (Golub, Slonim et al. 1999), (Alizadeh, Eisen et al. 2000), (Bhattacharjee, Richards et al. 2001).
The use of genetic information in epidemiology has helped design effective diagnostics, new therapeutics, and novel drugs, which has led to the recent era of personalized medicine (genomic medicine) (Stephenson, Smith et al. 2005), (Edén, Ritz et al. 2004), (Teschendorff, Naderi et al. 2006). However, these genetic factors alone cannot explain all the intricacies of complex diseases. For example, the incidence of cancer varies widely among countries due to environmental factors, even within the same ethnic group when its members migrate from one country to another (Redmond Jr 1970), (Weinberg 2007). In recent studies (Schadt 2009, Eichler, Flint et al. 2010), it has been hypothesized that most complex diseases are caused by the combined effects of many diverse factors, including genetic, genomic, behavioral and environmental factors. For example, cancer, which has been the most widely studied disease phenotype in the last few decades, is extremely heterogeneous. Different clinical endpoints of cancer, such as the idiosyncrasy of individual tumors, the survival rate of cancer patients after chemotherapy or surgical treatment, the development of metastasis, and the effectiveness of drug therapy, are governed by different risk factors, including multiple mutations of genetic factors (e.g., RAS, RTK, TGF-β, Wnt signaling pathways), behavioral factors (e.g., tobacco exposure, diet, lifestyle) (Weinberg 2007), long-term environmental effects (e.g., stresses, temperature, radiation, oxygen tensions, hydration and tonicity, micro- and macro-nutrients, toxins) (Loscalzo, Kohane et al. 2007) and inherent germline variations (e.g., BRCA1/2) (West, Ginsburg et al. 2006). Therefore, clinico-pathological and genomic datasets capture the effects of such diverse factors on complex diseases in a complementary rather than a supplementary manner.
Using the two diverse perspectives provided by both types of data can potentially reveal disease complexities in greater detail. In addition, the individual effects of each of the clinicogenomic factors on disease predisposition can be small and thus can remain undetected by most disease association techniques performed on individual datasets. However, interactions among those individual factors may be responsible for increasing the risk of complex disease (Anastassiou 2007), (Loscalzo, Kohane et al. 2007). For example, neither a gene nor an environmental factor like tobacco use may be significantly associated with lung cancer by itself, but together they can increase the risk significantly (Zhou, Liu et al. 2002). In a more complicated scenario, a complex genetic network can evolve dynamically under various environmental factors (Schadt 2009). This holds even for Mendelian, monogenic disorders such as sickle cell disease, where different phenotypes have been observed depending on environmental effects (Kato, Gladwin et al. 2007). Besides interactions, there may be other types of relationships, such as causal relationships between two types of markers (Lê Cao, Meugnier et al. 2010). For example, some pathological variables, such as PSA, can also have upstream genetic influence (Singh, Febbo et al. 2002). In this case, the individual factors from different datasets may not be strong biomarkers; rather, the relationships among them, including interactions, can act as potential biomarkers. Leveraging such wider relationships, including interaction, correlation and causality among the genomic, pathological, environmental, and behavioral factors, is important for understanding the nature of diseases. It will also assist in making better clinical decisions. For example, surgery can be avoided if some causative genomic markers of the tumor can be targeted in the early stage of breast cancer.
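The point about a gene and tobacco exposure each showing little association alone can be illustrated with a toy calculation. The sketch below is only an illustration: the stratum counts are invented, and the helper names (`risk`, `marginal_risk_ratio`, `joint_risk_ratio`) are hypothetical, not taken from any cited study. It assumes a cohort in which carriers of the variant who are also exposed are rare but at high risk.

```python
# Toy cohort, keyed by (gene_variant, tobacco_exposure).
# All counts are invented for illustration only.
COUNTS = {
    (0, 0): {"n": 1000, "cases": 100},
    (0, 1): {"n": 1000, "cases": 100},
    (1, 0): {"n": 1000, "cases": 100},
    (1, 1): {"n": 50, "cases": 20},  # rare combination, high risk
}


def risk(strata):
    """Pooled disease risk over the given stratum keys."""
    n = sum(COUNTS[s]["n"] for s in strata)
    cases = sum(COUNTS[s]["cases"] for s in strata)
    return cases / n


def marginal_risk_ratio(factor_index):
    """Risk ratio for one factor while ignoring the other factor."""
    exposed = [s for s in COUNTS if s[factor_index] == 1]
    unexposed = [s for s in COUNTS if s[factor_index] == 0]
    return risk(exposed) / risk(unexposed)


def joint_risk_ratio():
    """Risk with both factors present, relative to neither present."""
    return risk([(1, 1)]) / risk([(0, 0)])


gene_rr = marginal_risk_ratio(0)     # gene considered alone
tobacco_rr = marginal_risk_ratio(1)  # tobacco considered alone
joint_rr = joint_risk_ratio()        # both factors together

print(f"gene alone:    RR = {gene_rr:.2f}")
print(f"tobacco alone: RR = {tobacco_rr:.2f}")
print(f"gene+tobacco:  RR = {joint_rr:.2f}")
```

In this invented cohort each factor on its own has a risk ratio of roughly 1.14, while the doubly exposed stratum has about four times the baseline risk. An analysis restricted to either dataset would see only the weak marginal signal.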
Note that if the association of clinical and genomic factors with the disease phenotype is assessed independently, such deeper levels of relationships among data sources cannot be discovered. It is essential to build integrative models that consider both genomic and clinical variables simultaneously while being cognizant of the interaction, redundancy, and correlation among those clinical and genomic data (Schadt 2009). This has led to an emerging research area of integrative studies of clinical and genomic data, which we refer to as clinico-genomic integration. In this review, we survey not only the issues and challenges in such clinico-genomic integrative studies, but also the approaches that aim to address those issues. Finally, we conclude with a general discussion of future research directions in this topic.

Clinicogenomic integration means building models by integrating clinical and genomic data. Clinical data refers to a broad category of a patient's pathological, behavioral, demographic, familial, environmental and medication history, while genomic data refers to any kind of a patient's genetic information, including SNPs, gene expression, and protein and metabolite profiles [Figure 2]. More specifically, a clinicogenomic study should have at least one clinical dataset and one genomic dataset for a group of people who are assessed for an outcome of a disease phenotype. Furthermore, we survey only integrative models with an emphasis on biomarker discovery. Therefore, each sample in the datasets is assessed for a particular disease phenotype. The phenotype can be either a binary class label, such as cancer vs. no cancer, tumor vs. normal tissue samples, or metastasis vs. non-recurrent cancer, or a continuous variable, e.g., the survival time after chemotherapy or other types of therapeutic treatment.
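The data requirement described above (a clinical dataset, a genomic dataset, and a phenotype for the same group of people) can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a method from the surveyed literature: the `Sample` class, the `integrate` function, the feature names, and the patient records are all hypothetical. It simply concatenates features per patient and keeps only patients present in every table.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    patient_id: str
    features: dict    # merged clinical and genomic features
    phenotype: float  # 0/1 class label, or a continuous value such as survival time


def integrate(clinical, genomic, phenotype):
    """Merge features for patients that appear in all three tables."""
    shared = sorted(set(clinical) & set(genomic) & set(phenotype))
    samples = []
    for pid in shared:
        # Prefix feature names so the data source stays visible after merging.
        merged = {f"clin:{k}": v for k, v in clinical[pid].items()}
        merged.update({f"gen:{k}": v for k, v in genomic[pid].items()})
        samples.append(Sample(pid, merged, phenotype[pid]))
    return samples


# Invented example records keyed by patient ID.
clinical = {
    "P1": {"age": 61, "smoker": 1},
    "P2": {"age": 48, "smoker": 0},
    "P3": {"age": 55, "smoker": 1},  # no genomic record, so dropped
}
genomic = {
    "P1": {"GENE_A_expr": 2.3},
    "P2": {"GENE_A_expr": 0.4},
    "P4": {"GENE_A_expr": 1.1},      # no clinical record, so dropped
}
phenotype = {"P1": 1, "P2": 0, "P3": 1, "P4": 0}

samples = integrate(clinical, genomic, phenotype)
for s in samples:
    print(s.patient_id, s.features, s.phenotype)
```

Keeping only the shared patients is one possible design choice; a real study might instead impute missing records or model each source separately before combining them.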
Achieving the goal of biomarker discovery requires identifying the clinical and genomic features from the data that are significantly associated with the disease phenotype. Integration of diverse biomedical datasets is a vast research topic and has been studied widely in many different domains. Although some initial efforts have been made by researchers for clinicogenomic integration,

Publication date: 2013